SPEECH RECOGNITION DEVICE FOR TOYS
BACKGROUND OF THE INVENTION 

Field of the Invention 

[1] The present invention relates to a speech recognition device for toys for
recognizing the speech of a number of unspecified people.



Description of Related Art 

[2] Among conventional speech recognition devices for toys, a speech recognition
device for recognizing a specific person is designed to recognize words
spoken by only one person, and the speech of the speaker has to be registered in a
RAM or a ROM before he or she actually uses the toy. Although the recognition rate of
the ROM is not bad, the following problems exist: no one other than the
registered person can use the toy, registration of the user is required, and the
registered speech is lost once the power goes down. Such a speech recognition device
is not suitable for use in toys, in particular toys for very young children. Among these
problems, the critical one is that the speech recognition device allows only one
person to use it, which limits the application of the device.

[3] On the other hand, a speech recognition device for recognizing the speech of
unspecified people is designed to recognize the speech of any person, and no
registration is required before the device is actually used. However, the speech of a
number of people has to be input into a ROM in advance, and the initial production of
speech data requires difficult work. In addition, as the number of words or speeches
to be recognized increases, the work gets more complicated and the capacity of the memory
for storing the increasing speech data has to be extended, resulting in high production
costs. Japanese Examined Patent Publication No. 2-39798 discloses a
related conventional example. In this conventional example, the length of an inputted
spoken word is measured, and when the measured length is determined to coincide
with the length of the word determined by a speech registration switch, speech
is outputted. However, where only the length of a word is measured, malfunctions
occur continuously in a noisy place, and the device has been found not to be entirely
suitable for practical use.

[4] In addition, although the device is designed to recognize words or speeches of
unspecified people, it can recognize only on the order of ten to twenty words or
speeches, and it cannot recognize every word people speak. Because of this, the
user has to consult an owner's manual every time he or she wants to know what type
of speech can be recognized, and in this sense the speech recognition device,
which is supposed to be convenient, is not convenient.

[5] An object of the present invention is to provide a speech recognition device
for recognizing the speech of a number of unspecified people using a
microcomputer or a speech synthesis IC, wherein the length of a pause or pauses
between two or more words is measured, whereby the speech is recognized.

[6] Another object of the present invention is to provide a speech recognition device
for recognizing the speech of a number of unspecified people, wherein the length
in time of a word spoken by a speaker for recognition is measured, whereby the speech
is recognized.

[7] A further object of the present invention is to provide a speech recognition device
for recognizing the speech of a number of unspecified people, wherein the length in time
of a word spoken by a speaker is compared with the length in time of a corresponding
speech synthesized word, and in the event that the result of the comparison falls within a
predetermined tolerance, the word spoken by the speaker is recognized.

SUMMARY OF THE INVENTION 

[8] According to a first aspect of the present invention, there is provided a speech
recognition device for toys comprising a storage means for measuring the length in time of a
combination of two or more continuous words or expressions and the length in time of a pause
or pauses between the words or expressions and then storing a measured value in advance, a
control means for measuring the length in time of a word or expression spoken by a speaker,
comparing a measured value with the measured value stored in the storage means, and
recognizing the word or expression of the speaker in the event that the result of the
comparison falls within a predetermined tolerance, and an output means for outputting the
result of the recognition so carried out.

[9] According to the configuration, in addition to the recognition of the meaning of one
word or expression, the meaning of a set of two or more words or expressions which are
continuously spoken can be recognized from a combination of the two or more words or
expressions and a pause or pauses between the words or expressions when they are so spoken.
For example, in recognizing a combination of two expressions, "konnichiwa (hello)" and
"ii tenki desu (it is good weather)," the first expression and the second expression are
recognized together with the pause between them that occurs when they are spoken, whereby
the meaning of the combination of the two expressions, "konnichiwa, ii tenki desu
(Hello, it is good weather)," can be recognized. Some people speak fast and others
slowly, and therefore one expression is made to be recognized in two
ways, short and long. When the first and second expressions are each made to be recognized
in two ways, short and long, the two expressions together can be recognized in four
ways. Then, when a pause is provided between the first and second expressions, the
pause is also made to be recognized in two ways, short and long, and as a result
the combination of the expressions can be recognized in eight ways, whereby a
speech recognition device for toys with higher accuracy can be provided.
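
The count of eight follows from two choices for each of the three timed elements. A minimal sketch of that enumeration is given below; the millisecond values and the names `FIRST_MS`, `PAUSE_MS`, `SECOND_MS` and `timing_variants` are assumptions made for illustration, since the text gives no numeric durations.

```python
from itertools import product

# Hypothetical durations in milliseconds; the text gives no numeric values.
FIRST_MS = {"short": 500, "long": 800}     # e.g. "konnichiwa"
PAUSE_MS = {"short": 200, "long": 600}     # pause between the two expressions
SECOND_MS = {"short": 700, "long": 1100}   # e.g. "ii tenki desu"


def timing_variants(first, pause, second):
    """Return every (first, pause, second) duration triple to be stored."""
    return list(product(first.values(), pause.values(), second.values()))


variants = timing_variants(FIRST_MS, PAUSE_MS, SECOND_MS)
print(len(variants))  # 2 * 2 * 2 = 8
```

Each triple could then serve as one stored pattern against which a measured utterance is compared.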

[10] Thus, in the event that the length of any of the two or more words or
expressions differs from that stored in the storage means, or the length of the pause
between any two of the two or more words or expressions differs from that stored in the
storage means, the control means does not recognize the combination, and therefore no
malfunction or misrecognition occurs.

[11] In addition, according to a second aspect of the present invention, there is provided a
speech recognition device for toys comprising a storage means for measuring the length
in time of a word or expression spoken by a speaker for recognition and then storing a
measured value in advance, a control means for measuring the length in time of a word or
expression spoken by a speaker, comparing a measured value with the measured value stored
in the storage means and recognizing the word or expression of the speaker in the event
that the result of the comparison falls within a predetermined tolerance, and an output means
for outputting in voice the result of the recognition so carried out.

[12] According to the configuration, since the speech recognition device is designed for
use in toys for children, when a child as a player speaks to the speech recognition
device, the device measures the length in time of the word or expression of the speaker,
recognizes the word or expression of the speaker in the event that the result of the
comparison falls within a predetermined tolerance, and outputs a word or expression in voice
as the result of the recognition via a device main body. For instance, in the case of a toy
which is a model of a cat, when the player calls the name of the cat toy, "Tama," it answers
the player by mewing. Thus, according to the present invention, an interactive speech
recognition toy like the one described above can be provided.

[13] Furthermore, according to a third aspect of the invention, there is provided a
speech recognition device for toys comprising a storage means for storing the length in
time of a speech synthesized word or expression in advance, an output means for
outputting the speech synthesized word or expression, a control means for
measuring the length in time of a word or expression spoken by a speaker, comparing a
measured value with the length in time of the speech synthesized word or expression
stored in the storage means, and recognizing the word or expression of the speaker in the event
that the result of the comparison falls within a predetermined tolerance, and an outputting
means for outputting the result of the recognition.

[14] According to the configuration, a conversation with a machine (IC) can be realized by
making an IC execute both speech synthesis and speech recognition, and
moreover, the conversation with the machine can be realized at extremely low cost.
For example, in a case where an expression, "ohayo (good morning)," is speech
synthesized, if the length of an expression spoken by a speaker for recognition falls within a
predetermined tolerance of the speech synthesized expression, "ohayo (good morning),"
the spoken expression can be recognized. The speech synthesized expression
"ohayo (good morning)" is provided with a predetermined tolerance in length, short and
long, and therefore even if the expression is spoken fast or slowly, the length of the spoken
expression falls within the predetermined tolerance and the expression can be recognized.
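
The short and long limits can be read as a window around the stored length. The sketch below illustrates that comparison only; the class name `DurationWindow`, the method `accepts` and the millisecond limits for "ohayo" are assumptions, not values from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DurationWindow:
    """Accepted range for one stored expression, in milliseconds."""
    short_ms: int  # fastest accepted utterance
    long_ms: int   # slowest accepted utterance

    def accepts(self, measured_ms: int) -> bool:
        """True when the measured length lies inside the tolerance window."""
        return self.short_ms <= measured_ms <= self.long_ms


# Hypothetical window for "ohayo"; the values are illustrative only.
OHAYO = DurationWindow(short_ms=450, long_ms=750)
```

With these assumed limits, `OHAYO.accepts(600)` holds while `OHAYO.accepts(900)` does not, so a fast or slow "ohayo" is accepted but a much longer utterance is rejected.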

[15] Furthermore, according to a fourth aspect of the present invention, there is provided a
speech recognition device for toys as set forth in the third aspect above, comprising a
control means for measuring the length in time of a word or expression spoken by a speaker
which corresponds to the outputted speech synthesized word or expression, comparing a
measured value with the length in time of the speech synthesized word or expression
which is stored in the storage means and recognizing the spoken word or expression of the
speaker in the event that the result of the comparison falls within a predetermined
tolerance, and an outputting means for outputting the recognized result.

[16] According to the configuration, the speech recognition device measures the length
in time of the word or expression spoken by the speaker which corresponds to the outputted
speech synthesized word or expression and recognizes the word or expression spoken by
the speaker provided that the measured value falls within the predetermined tolerance. Namely,
the player can enjoy a quiz by imagining a word or expression in association with the
outputted speech synthesized word or expression. For example, given a speech
synthesized question, "What is the highest mountain in Japan?", if the player
or speaker answers "Mt. Fuji," the answer is correct and is then recognized.

[17] In addition, according to a fifth aspect of the present invention, there is provided
a speech recognition device for toys as set forth in the third or fourth aspect of the
present invention, wherein the storage means stores in advance the length in time of a
combination of the length in time of the speech synthesized words or expressions and the
length in time of a pause between the words or expressions, and wherein the control means
measures the length in time of the pause between the words or expressions and the length in
time of words or expressions spoken by the speaker, compares the measured values with the
length in time of the combination of the length in time of the pause stored in the storage
means and the length in time of the words or expressions spoken by the speaker for recognition,
and recognizes the words or expressions spoken by the speaker provided that the result of the
comparison falls within the predetermined tolerance.

[18] According to the configuration, since recognition is based on the length in time of the
combination of the length of the blank time, from when the speech synthesized words or
expressions are outputted until the words or expressions for recognition are spoken by the
speaker, and the length in time of the words or expressions spoken by the speaker,
speech recognition with fewer errors can be attained.

BRIEF DESCRIPTION OF THE DRAWINGS



[19] Fig. 1 is a diagram showing a principle according to the present invention for
measuring the length in time of a word or expression,

[20] Fig. 2 is a diagram showing another principle according to the present invention for
measuring the length in time of a word or expression,

[21] Fig. 3 is a diagram showing a further principle according to the present invention for
measuring the length in time of a word or expression,

[22] Fig. 4 is a diagram showing the configuration of hardware for use in the present
invention, and

[23] Fig. 5 is a diagram showing another configuration of hardware for use in the present
invention.

BEST MODE FOR CARRYING OUT THE INVENTION 

[24] With a view to describing the present invention in greater detail, a best mode for 
carrying out the invention will be described below with reference to the accompanying drawings. 

[25] Fig. 1 is a diagram showing a principle according to the present invention for measuring
the length in time of a word or expression. Reference character A denotes the length of a
word or expression. For example, the expression "konnichiwa (hello)" and the
expression "konbanwa (good evening)" each consist of five Japanese Hiragana
characters or five syllables, and the length, or the number of characters or syllables, of
the two expressions is the same. Reference character C also denotes the length of an
expression such as "ii tenki desu (it is good weather)" or "okaimono desu (I'm doing the
shopping)" of seven Japanese Hiragana characters or syllables, and therefore the length of
these two expressions is identical. Reference character B is the length of a pause between
the expression A and the expression C and denotes the length of the pause interposed between
the expressions "Hello" and "It's good weather" when the speaker says, "Hello ....
It's good weather."
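
The lengths A, B and C of Fig. 1 can be pictured as run lengths of sound and silence. The sketch below is an illustration only: it assumes the input has already been reduced to one true/false activity flag per fixed frame, and the frame period `FRAME_MS` and the function `measure_segments` are hypothetical names, not part of the specification.

```python
from itertools import groupby

FRAME_MS = 10  # assumed frame period; the text does not specify one


def measure_segments(active_frames, frame_ms=FRAME_MS):
    """Turn per-frame activity flags into (kind, duration_ms) runs.

    Leading and trailing silence is dropped, so the result alternates
    sound, pause, sound, ... for one utterance.
    """
    segments = []
    for is_sound, run in groupby(active_frames):
        kind = "sound" if is_sound else "pause"
        segments.append((kind, sum(1 for _ in run) * frame_ms))
    while segments and segments[0][0] == "pause":
        segments.pop(0)
    while segments and segments[-1][0] == "pause":
        segments.pop()
    return segments


# Expression A (500 ms), pause B (200 ms), expression C (700 ms).
frames = [False] * 3 + [True] * 50 + [False] * 20 + [True] * 70 + [False] * 5
print(measure_segments(frames))
# [('sound', 500), ('pause', 200), ('sound', 700)]
```

This simple run-length view ignores brief silences inside a word; a practical device would presumably need some smoothing before segmenting.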

[26] Thus, the two expressions are spoken continuously, and the meaning thereof is
recognized from the combination of the expressions and the pause therebetween. Consequently,
the recognition is effected in the event that the length of a combination of the expression
A, the pause B and the expression C falls within the tolerance of the
length of a set expression. Additionally, the recognition is effected even if a combination of
the expression A and the expression C or a combination of the pause B and the expression C falls
within the tolerance of the length of the set expression. The recognition is processed entirely by
a microcomputer, which will be described later.
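
One way to picture the comparison of A, B and C against a set expression is an element-by-element check. This is a sketch under assumptions: the function name `matches_template`, the single shared tolerance and all numeric values are illustrative, and comparing each element separately is only one possible reading of the text.

```python
def matches_template(measured_ms, template_ms, tolerance_ms):
    """Compare measured (A, B, C) lengths with a stored set expression.

    Every element must lie within tolerance_ms of its stored counterpart;
    a differing word length or pause length rejects the whole combination.
    """
    if len(measured_ms) != len(template_ms):
        return False
    return all(abs(m - t) <= tolerance_ms
               for m, t in zip(measured_ms, template_ms))


# Hypothetical stored pattern: A = 500 ms, B = 200 ms, C = 700 ms.
TEMPLATE = (500, 200, 700)

print(matches_template((520, 180, 730), TEMPLATE, 50))  # True
print(matches_template((520, 400, 730), TEMPLATE, 50))  # False: pause too long
```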

[27] Next, Fig. 2 is a diagram showing another principle according to the present
invention for measuring the length in time of a word or expression. Reference character
A' denotes the length of a speech synthesized expression produced when a
speech synthesized expression is expressed as an IC output, such as the expression "ohayo
(good morning)" of four Japanese Hiragana characters or syllables. In the figure, the lower
line denotes a speech input, and reference character A is the length of a word or
expression corresponding to the speech synthesized sound A'. If the lengths
of A' and A are identical, then the sound produced by the speaker is recognized. Reference
character B denotes a pause between the speech synthesized sound A' and the sound A
produced by the speaker, that is, the time from when the speech synthesized sound is outputted
until the speech is inputted for recognition, and with this pause speech recognition
with fewer errors can be provided. Namely, where the pause is not used, the speech
recognition is effected based on the length A of the sound only, whereas where the pause is
used, the speech recognition is effected based on the combination of the length B of
the pause and the length A of the sound, and therefore the recognition capability can be
increased in the latter case. Furthermore, it is possible to make the speaker imagine a
word or expression following an initial speech synthesized sound when the latter is
expressed. For example, when a question, "What is the highest mountain in Japan?", is asked in
speech synthesized sounds A', if the speaker answers "Mt. Fuji" as sounds
produced by the speaker, then the answer is recognized. In this case, the speaker can
operate the speech recognition device without consulting the owner's manual every
time.
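
The difference between recognizing on the length A alone and on the pause B together with the length A can be sketched as two acceptance tests. The function names, the upper limit on the pause and the numeric values below are assumptions made for illustration.

```python
def accept_length_only(answer_ms, expected_ms, tol_ms):
    """Compare only the length A of the spoken answer."""
    return abs(answer_ms - expected_ms) <= tol_ms


def accept_pause_and_length(pause_ms, answer_ms, max_pause_ms,
                            expected_ms, tol_ms):
    """Also require the pause B after the prompt to fall inside its window."""
    pause_ok = 0 <= pause_ms <= max_pause_ms
    return pause_ok and accept_length_only(answer_ms, expected_ms, tol_ms)


# A sound of the right length arriving 8 s after the prompt passes the
# length-only test but is rejected once the pause is taken into account.
print(accept_length_only(610, 600, 100))                   # True
print(accept_pause_and_length(8000, 610, 3000, 600, 100))  # False
print(accept_pause_and_length(1200, 610, 3000, 600, 100))  # True
```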

[28] Fig. 3 is a diagram showing a further principle according to the present invention
for measuring the length in time of a word or expression. Reference character A denotes the
length in time of a word or expression spoken by the speaker and is the length of the
word "Tama (the name of a cat)" of two Japanese Katakana characters or syllables. In the
figure, the lower line is a speech input by the speaker.

[29] Reference character A' denotes a speech synthesized sound produced
when a word or expression produced through speech synthesis is expressed as an IC
output, and when there is a speech input "Tama," a synthesized speech
"mew" is expressed in reply to the speech input. In this case, the speech input
by the speaker is compared with a measured value stored in the storage part, and
when the result of the comparison falls within a predetermined tolerance, the word of the
speaker is recognized, and the result of the recognition is outputted in speech.

[30] Fig. 4 is a diagram showing the configuration of hardware for use in the present
invention. In this embodiment, a microcomputer is used and only speech
recognition is carried out. A speech signal having passed through a microphone 1 is
amplified at an amplifier 2, thereafter the analog signal is converted into a digital signal
at an integrating circuit 3, and the digital signal so converted is then inputted into a
microcomputer 4. The microcomputer 4 comprises a storage part in which
combinations of two or more continuous words or expressions are stored, an operating part for
recognizing a word or expression spoken by the speaker when the length in time of the word or
expression falls within a predetermined tolerance, and a control part for outputting the
result of the recognition. Thus, the digital signal inputted into the microcomputer 4
is controlled at the control part, thereafter the control signal is received by the
operating part, and an operating process is carried out at the operating part to see if the
control signal and the first word or expression stored in the storage part can be recognized
as the word or expression of the speaker. When the control signal and the first word
are determined to be recognized as the word or expression of the speaker as a result of
the operating process, an LED or bulb is illuminated. The speaker can tell from
the illumination of the LED or bulb that the first word has been recognized, and
times the input of a second word so that it occurs before the LED or bulb is switched off.
An operating process similar to that carried out for the first word is performed, and when the
second word is determined to be recognized, the control part outputs an electric signal
for driving a motor 5, illuminating or blinking a bulb 6 or activating an electromagnet
7, whereby the arms, legs, eyes or mouth of a stuffed toy or doll can be activated, and
at the same time a conversation with the toy can be realized.
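
The two-step flow of this embodiment (first word, LED window, second word, actuator output) can be sketched as a small state machine. Everything named here, including `TwoWordRecognizer`, the action strings and the millisecond values, is a hypothetical illustration rather than the program actually run on the microcomputer 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwoWordRecognizer:
    """Sketch of the first-word / LED window / second-word sequence."""
    first_ms: int        # stored length of the first word
    second_ms: int       # stored length of the second word
    tolerance_ms: int    # predetermined tolerance
    led_window_ms: int   # how long the LED stays lit (assumed parameter)
    led_on_until: Optional[int] = None

    def on_word(self, start_ms: int, duration_ms: int) -> str:
        """Feed one measured word and return the resulting action."""
        if self.led_on_until is not None and start_ms <= self.led_on_until:
            # Second word arrived while the LED was still lit.
            self.led_on_until = None
            if abs(duration_ms - self.second_ms) <= self.tolerance_ms:
                return "drive_output"   # motor 5, bulb 6 or electromagnet 7
            return "reject"
        # No open window (or it expired): treat the input as a first word.
        self.led_on_until = None
        if abs(duration_ms - self.first_ms) <= self.tolerance_ms:
            self.led_on_until = start_ms + duration_ms + self.led_window_ms
            return "led_on"
        return "reject"


toy = TwoWordRecognizer(first_ms=500, second_ms=700,
                        tolerance_ms=50, led_window_ms=2000)
print(toy.on_word(0, 510))      # led_on
print(toy.on_word(1500, 690))   # drive_output
```

A second word that starts after the window has closed is treated as a fresh first word, so it is rejected unless its length happens to match the first word.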

[31] Fig. 5 is a diagram showing another configuration of hardware for use in the
present invention. In this embodiment, a speech synthesis IC is used for
speech recognition. Normally, an inexpensive 4-bit or 8-bit microcomputer can be used as
a speech recognition microcomputer. However, in this embodiment, in order to
further reduce the production cost, a normal speech synthesis IC is
programmed for use for this purpose. When a switch SW of a main body is closed,
the speech synthesis IC 4 performs control such that a speech synthesized sound
is amplified at an amplifier 8 and the speech synthesized sound so amplified is
then outputted through a speaker 9. When the output of the speech synthesized
sound is completed, the LED or bulb is illuminated. The user times the utterance of a
word or expression corresponding to the speech synthesized sound through the
microphone 1 so that it occurs before the LED or bulb is switched off. The user speaks a
word or expression corresponding to the speech synthesized sound through the
microphone 1 when he or she hears the speech synthesized word or expression. In this
case, the user may speak the word or expression corresponding to the speech
synthesized sound as soon as he or she hears the speech synthesized
sound, or the user may answer the speech synthesized sound after a certain pause following
the output of the speech synthesized sound. These operations are processed entirely
through the program. The speech signal passing through the
microphone 1 is amplified at an amplifier 2, and thereafter the analog signal is
converted into a digital signal at an integrating circuit 3, the digital signal so converted
being then inputted into the speech synthesis IC. When a word or expression
having a length corresponding to the speech synthesized sound is inputted into the
speech synthesis IC, the result of the speech recognition by the program is
outputted as speech, whereby a motor 5 is driven, the bulb 6 is
illuminated or blinked or the electromagnet 7 is activated. Thus, the arms, legs, eyes or
mouth of a doll can be activated through a speech command, and at the
same time a conversation with the toy can be realized.

[32] According to the present invention, it is possible, by using the speech
synthesis IC, to repeat a process in which an answer is given in response to a question from the
computer, and this resembles a real conversation between human beings, whereby
the user can express his or her wishes in a sequential fashion. Finally, it is possible to
make the microcomputer or speech synthesis IC recognize many things and
thereby make it follow orders from the user.



INDUSTRIAL APPLICABILITY

[33] As has been described heretofore, according to the present invention, where the
system of the invention is programmed in the microcomputer or speech synthesis IC,
a speech recognition device can be provided which is simple in configuration and
inexpensive, whereby a conversation with the computer can be realized.

[34] In addition, according to the present invention, the recognition accuracy can be
increased by combining a plurality of words or expressions and limiting the length of each
word or expression and the length in time of the pause between the words or expressions.
Moreover, according to the present invention, in a case where the speech synthesis IC is
used for speech recognition, it is possible to make the user imagine the contents of what
he or she is to say, or to make the speech recognition device speak the contents of
guidance, which obviates the need for an owner's manual on how to use the
speech recognition device. Furthermore, according to the present invention, it is
possible to make the microcomputer produce an output in response to the result of speech
recognition so as to synchronize actions other than a conversation. In addition,
since the recognition is carried out based on the length of a sound, the speech
of any person can be recognized irrespective of the sex, age or the like of the speaker.
Additionally, since only data on the lengths of sounds are programmed, the memory
capacity of the system can be reduced greatly, whereby a low priced product can be
provided. In particular, in a case where a speech synthesis IC is used, an
extremely low priced product can be provided.

[35] In addition, according to the present invention, no speech registration is
required at all before use, and therefore the speech recognition device according to
the present invention can be used just after it is switched on. According to the present
invention, although the speech of a number of unspecified people can be
recognized, no speech data has to be collected. Furthermore, according to the
present invention, the speech recognition device is small in size and consumes
very little power, and therefore a speech recognition device can be produced
which is small in size and uses a small battery, which is economical.
ABSTRACT

[36] The present invention provides a speech recognition device for toys
comprising a storage means for measuring the length in time of a combination of two or
more continuous words or expressions and the length in time of a pause or pauses
between the words or expressions and then storing a measured value in advance, a
control means for measuring the length in time of a word or expression spoken by a
speaker, comparing a measured value with the measured value stored in the storage means,
and recognizing the word or expression of the speaker in the event that the result of
the comparison falls within a predetermined tolerance, and an output means for outputting
the result of the recognition so carried out.